DiVA: Using Application-Specific Policies to 'Dive' into Vector Approximations

Authors

  • Konstantinos Tsakalozos
  • Spiros Evangelatos
  • Fotis Psallidas
  • Marcos R. Vieira
  • Vassilis J. Tsotras
  • Alex Delis

Abstract

In high-dimensional data domains, conventional tree-based access structures are occasionally outperformed by simple sequential scans. Approximation-based methods were introduced to speed up queries by providing compact representations of the stored data. These methods use vector quantization to index data that are mainly presumed to follow a uniform distribution. In real-world environments, however, both the data and the query distributions are mostly skewed. To address this dual challenge, we propose DiVA, which combines the selective use of an approximation approach with an indexing mechanism that organizes data subspaces in a high fan-out hierarchical structure. Moreover, DiVA reorganizes its own elements after receiving application hints about data access patterns. These hints, or policies, trigger the restructuring and possible expansion of DiVA so as to offer finer indexing granularity and improved access times in subspaces that emerge as ‘hot-spots’. The novelty of our approach lies in the self-organizing nature of DiVA, driven by application-provided policies; these policies guide the refinement of DiVA’s elements as new data arrive, existing data are updated and the query workload continually changes. An extensive experimental evaluation using real data shows that DiVA reduces the total number of I/Os by up to 64% compared with state-of-the-art methods, including the VA-file, GC-tree and A-tree.
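To make the approximation idea concrete, here is a minimal Python sketch of a classic VA-file style filter-and-refine search, the kind of approximation scheme DiVA uses selectively and compares against. It is not DiVA itself: the uniform per-dimension grid, the Euclidean metric and the function names (`build_grid`, `approximate`, `lower_bound`, `knn`) are illustrative assumptions. Each vector is reduced to a tuple of cell numbers; a query first ranks all approximations by a lower bound on the true distance and then fetches exact vectors only until no remaining cell can improve the answer.

```python
import heapq
import math
from bisect import bisect_right


def build_grid(data, bits):
    """Split each dimension into 2**bits equal-width cells; return the cell edges."""
    cells = 2 ** bits
    grid = []
    for d in range(len(data[0])):
        lo = min(v[d] for v in data)
        hi = max(v[d] for v in data)
        if hi == lo:
            hi = lo + 1.0  # degenerate dimension: give it a unit-width range
        step = (hi - lo) / cells
        # Pin the last edge to hi so every data point falls inside some cell.
        grid.append([lo + i * step for i in range(cells)] + [hi])
    return grid


def approximate(v, grid):
    """Map a vector to its per-dimension cell numbers (the compact approximation)."""
    approx = []
    for x, edges in zip(v, grid):
        c = bisect_right(edges, x) - 1
        approx.append(min(max(c, 0), len(edges) - 2))  # x == last edge goes to last cell
    return tuple(approx)


def lower_bound(q, approx, grid):
    """Smallest Euclidean distance from q to any point inside the approximated cell."""
    total = 0.0
    for x, c, edges in zip(q, approx, grid):
        low, high = edges[c], edges[c + 1]
        gap = low - x if x < low else (x - high if x > high else 0.0)
        total += gap * gap
    return math.sqrt(total)


def knn(q, data, approxs, grid, k):
    """Two-phase k-NN: rank by lower bound, then refine with exact distances."""
    # Phase 1: scan every compact approximation and sort by lower bound.
    candidates = sorted((lower_bound(q, a, grid), i) for i, a in enumerate(approxs))
    best = []  # max-heap of (-distance, index) holding the current k best
    visited = 0  # exact-vector fetches, a stand-in for random I/Os
    for lb, i in candidates:
        # Stop once no remaining cell can beat the current k-th best distance.
        if len(best) == k and lb > -best[0][0]:
            break
        visited += 1
        dist = math.dist(q, data[i])
        if len(best) < k:
            heapq.heappush(best, (-dist, i))
        elif dist < -best[0][0]:
            heapq.heapreplace(best, (-dist, i))
    ranked = sorted((-neg, i) for neg, i in best)
    return [i for _, i in ranked], visited
```

A larger `bits` value tightens the lower bounds and lowers `visited`, at the cost of bigger approximations. As the abstract describes it, DiVA makes this kind of granularity trade-off per subspace, refining only where application policies flag hot-spots rather than uniformly across the whole space.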

Similar resources

DiVA: Indexing high-dimensional data by "diving" into vector approximations

Contemporary multimedia, scientific and medical applications use indexing structures to access their high-dimensional data. Yet, in sufficiently high-dimensional spaces, conventional tree-based access methods are eventually outperformed by simple serial scans. Vector quantization has been effectively used to index data that are mostly distributed uniformly. However, in real-world applications, c...

Distributed vector architectures

Integrating processors and main memory is a promising approach to increase system performance. Such integration provides very high memory bandwidth that can be exploited efficiently by vector operations. However, traditional vector applications would easily overflow the limited memory of a single integrated node. To accommodate such workloads, we propose the DIstributed Vector Architecture (DIV...

The Nondeterministic Divide

The nondeterministic divide partitions a vector into two nonempty slices by allowing the point of division to be chosen nondeterministically. Support for high-level divide-and-conquer programming provided by the nondeterministic divide is investigated. A diva algorithm is a recursive divide-and-conquer sequential algorithm on one or more vectors of the same range, whose division point for a new ...

Corruption – Taking a Deeper Dive; Comment on “We Need to Talk About Corruption in Health Systems”

This commentary while agreeing broadly with the points raised by the editorial by McKee et al, seeks to broaden and deepen those arguments. The commentary contends that unless we understand corruption as deeply embedded in and propping up systems of power differentials, we will not be able to design interventions that will tackle corruption at its roots. The commentary further points to the con...

The Application of Least Square Support Vector Machine as a Mathematical Algorithm for Diagnosing Drilling Effectivity in Shaly Formations

The problem of slow drilling in deep shale formations occurs worldwide causing significant expenses to the oil industry. Bit balling which is widely considered as the main cause of poor bit performance in shales, especially deep shales, is being drilled with water-based mud. Therefore, efforts have been made to develop a model to diagnose drilling effectivity. Hence, we arrived at graphical cor...

Journal:
  • Comput. J.

Volume 59

Publication date: 2016